Head-mounted device for the perception of augmented reality
Patent abstract:
Head-mounted device for the perception of augmented reality. The invention relates to a device comprising means for mounting on the user's head; at least one screen; at least one microphone; at least one speaker; and a data processing module. Said module comprises sub-modules for sound processing, image processing and data output, and optionally for positioning and orientation. Among other tasks, it correlates at least one of the received sounds with a received image corresponding to the physical entity that emits said sound, and eliminates the sound together with the image corresponding to that physical entity; alternatively, it eliminates a correlated image or sound whenever the corresponding sound or image, respectively, is eliminated. The invention also relates to a method of generating an augmented reality environment with the described device.

Publication number: ES2639862A1
Application number: ES201700463
Application date: 2017-03-31
Publication date: 2017-10-30
Inventor: Gonzalo Pascual RAMOS JIMÉNEZ
Applicant: Universidad de Málaga
Patent description:
DESCRIPTION

Head-mounted device for the perception of augmented reality.

Field of the invention

The present invention relates to a device mounted on the head of a user for the perception of augmented reality. Specifically, it is intended that the user perceives reality in part, but with certain modifications, as explained in greater detail below.

Background of the invention

Various head-mounted devices for virtual reality are known, in which the user has a visual perception of a scenario completely different from reality. In addition, various augmented reality devices are known, mainly glasses that superimpose certain images on reality to provide the user with a mixed experience between reality and virtuality.

In accordance with the foregoing, and in the context of the present invention, evolutions or variants of the concept of augmented reality will be understood to be comprised within that concept, such as mixed reality, (computer-)mediated reality, substitutional reality, or integrated reality, for example.

However, prior-art devices have focused exclusively on the visual aspect, leaving aside, for example, the sound, so the augmented reality experience they provide is incomplete.

Description of the invention

The present invention discloses a head-mounted device for augmented reality perception comprising an image capture module, a sound capture module, a data processing module and an output module comprising a screen and speakers. In particular, the device of the present invention is characterized in that the data processing module has various sub-modules for the treatment of sound, in order to give the user a complete experience that is based not only on visual perception but also gives due importance to auditory perception.

Specifically, the present invention discloses a device mounted on the head of a user for the perception of augmented reality comprising:

• means for mounting on the user's head;
• at least one screen;
• at least one microphone;
• at least one speaker; and
• a data processing module, said module comprising in turn an image processing sub-module, a sound processing sub-module and a data output sub-module, the sound processing sub-module comprising means for decomposing the sounds received through the at least one microphone into a series of spectrograms, and means for correlating said spectrograms with spectrograms from a library in order to identify which physical entity produces each sound.

Preferably, the correlation of the spectrograms generated from the received sounds with spectrograms from a library is carried out by at least one of the following techniques: Template Matching, SVM, Deep Learning and/or neural networks, for example a multilayer perceptron (MLP).

On the other hand, the correlation of the spectrograms generated from the received sounds may comprise, for example, the generation of images of the received spectrograms, the library also comprising spectrogram images. In this way, the correlation and analysis of the spectrograms is performed not by comparing raw series of data, but by comparing images.
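By way of illustration, the following minimal sketch (Python; not part of the patent text) shows how a received sound could be decomposed into a normalized spectrogram and compared, as a grey-scale image, against a small library of reference spectrograms. The window size, the normalization and the plain cross-correlation similarity measure are assumptions made for the example.

```python
import numpy as np
from scipy import signal

def to_spectrogram(audio: np.ndarray, sample_rate: int) -> np.ndarray:
    """Decompose a mono signal into a frequency/intensity spectrogram."""
    _, _, sxx = signal.spectrogram(audio, fs=sample_rate, nperseg=512)
    # Log scale approximates perceived intensity; normalizing to [0, 1]
    # lets spectrograms be treated like grey-scale images.
    sxx = np.log1p(sxx)
    return sxx / (sxx.max() + 1e-12)

def image_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Normalized cross-correlation between two equal-sized spectrogram images."""
    a = (a - a.mean()) / (a.std() + 1e-12)
    b = (b - b.mean()) / (b.std() + 1e-12)
    return float(np.mean(a * b))

def identify(audio: np.ndarray, sample_rate: int, library: dict) -> str:
    """Return the label of the library spectrogram that best matches the input.

    `library` maps a label (e.g. "car", "bird") to a reference spectrogram;
    equal shapes for query and references are assumed here.
    """
    query = to_spectrogram(audio, sample_rate)
    return max(library, key=lambda label: image_similarity(query, library[label]))
```

In practice, the reference spectrograms would come from the sound database described below, and one of the learning techniques listed above would typically replace the plain cross-correlation.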
The data output sub-module of the device of the present invention may comprise means for eliminating certain sounds whose spectrograms have been identified. Analogously, the data output sub-module may also comprise means for adding sounds, for example sounds stored in the database. Preferably, the data output sub-module comprises means for selecting spectrograms, means for selecting images, and means for transmitting the selected images and/or spectrograms to the at least one screen and/or speaker.

The data output sub-module may be provided with an input from the image processing sub-module and an input from the sound processing sub-module, and may have means for correlating at least one of the inputs of the image processing sub-module with at least one of the inputs of the sound processing sub-module. The data output sub-module may further have means for eliminating a correlated image if the sound corresponding to said image is eliminated and, similarly, means for eliminating a correlated sound if the image corresponding to said sound is eliminated.

Optionally, the device of the present invention has positioning means, for example a GNSS system ("Global Navigation Satellite System"), such as GPS, GLONASS, Galileo, etc. Similarly, the device may have means to determine its orientation, such as accelerometers.

It is another objective of the present invention to disclose a method of generating an augmented reality environment by means of a device comprising:

• means for mounting on the user's head;
• at least one screen;
• at least one microphone;
• at least one speaker; and
• a data processing module;

and which comprises the stages of:

I. obtaining images through at least one camera;
II. obtaining sounds through at least one microphone;
III. data processing, which in turn comprises the correlation of at least one of the received sounds with a received image corresponding to the physical entity that emits said sound, and the elimination of the sound together with the image corresponding to that physical entity; and
IV. reproduction, through the at least one screen and/or the at least one speaker, of the images and/or sounds that have not been eliminated in stage III.

For the correlation of the received sounds, a series of spectrograms corresponding to the received sounds can be obtained beforehand, and/or the received sounds can be compared with a sound database. Preferably, said sound database resides in a memory of the device although, alternatively, it can be located on a server remote from the device, for example in the cloud.

Additionally, the correlation of the received sounds with a received image may include processing by artificial intelligence techniques such as, for example, Template Matching, SVM, Deep Learning and/or neural networks, for example a multilayer perceptron (MLP).

More preferably, stage IV comprises reproduction, through the at least one screen and/or the at least one speaker, of at least one image and/or sound stored in the database in addition to those obtained in stage I.

In addition to the correlation of sounds and images included in stage III of the data processing, said stage III may comprise the elimination of a correlated image if the sound corresponding to said image is eliminated, as well
as the elimination of a correlated sound if the image corresponding to said sound is eliminated.

Brief description of the drawings

The attached figures show, in an illustrative and non-limiting manner, two examples of embodiment of the system according to the present invention, in which:

- Figure 1 is an example of a device according to the present invention.
- Figure 2 is a flow chart of the operation of a device according to the present invention, emphasizing the sound processing sub-module of the data processing module and the interaction of said sub-module with the image processing and data output sub-modules.

Detailed description of an embodiment

Figure 1 shows a device according to the present invention. The figure shows the main parts of the device (1): at least one camera (2) located, for example, on the front of the device in order to obtain images of the specific point the user is looking at; at least one microphone (3) to capture ambient sounds; and, regarding output to the user, at least one screen (4) and at least one speaker (5). Alternatively, the device may comprise cameras on the sides (for example, one on each side) and/or on the back of the device, in order to allow image capture before the user changes the orientation of the device.

The device has means for mounting on the user's head, which can be, for example, a pair of temples (as in spectacles) or a section of a helmet that at least partially covers the head and, preferably, the ears of the user, with the speakers located in the vicinity of the user's ears, as shown in Figure 1.

Since the objective of the present invention is to present the user with a modified reality, it is conceived that, in a particularly preferred aspect of the present invention, means are provided so that the user sees only the images selected by the device and likewise hears only the selected sounds. Consequently, it is contemplated that the at least one screen (4) comprised in the device does not allow the user to see through it (that is, it is not transparent), preferably being of the NED type ("Near Eye Display"); additionally, in an embodiment with two speakers (5), these take the form of "on-ear" headphones with noise reduction means, to avoid hearing outside noises that may interfere with the information perceivable by the user.

According to the present invention, sounds and/or images of the user's environment are captured. Subsequently, data processing means select which of the captured images and/or sounds will finally be sent to the user. Additionally, the possibility is contemplated of adding, from a database, sounds and/or images that are not present in the user's environment.

Additionally, the device of Figure 1 has positioning and orientation means (6). In this way, the position of the device with respect to a coordinate system X, Y, Z, as well as its orientation, is available.

As for data processing, the device has a data processing module, preferably operating in real time. The data that the device must manage are mainly: images, sounds, position and orientation of the device, and data output. Consequently, the data processing module has sub-modules to process each of these types of data.
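As a purely structural illustration, the following sketch shows one way such a module could dispatch each cycle of sensor data to its sub-modules; every class and function name here is an assumption for the example, not the patent's API.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

@dataclass
class SensorFrame:
    """One processing cycle worth of captured data."""
    image: object                              # latest camera frame
    audio: object                              # latest microphone buffer
    position: Tuple[float, float, float]       # X, Y, Z (e.g. from GNSS)
    orientation: Tuple[float, float, float]    # heading, elevation, roll

class DataProcessingModule:
    """Routes each frame through the image, sound and data output sub-modules."""

    def __init__(self, process_image: Callable, process_sound: Callable,
                 produce_output: Callable):
        self.process_image = process_image      # image processing sub-module (40)
        self.process_sound = process_sound      # sound processing sub-module (20)
        self.produce_output = produce_output    # data output sub-module (30)

    def step(self, frame: SensorFrame):
        identified_objects = self.process_image(frame.image)
        identified_sounds = self.process_sound(frame.audio)
        # The output sub-module correlates both streams and decides what is
        # finally shown on the screen(s) and played through the speaker(s).
        return self.produce_output(identified_objects, identified_sounds,
                                   frame.position, frame.orientation)
```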
Image processing sub-module (40)

The objective of the image processing sub-module (40) is to obtain a series of images, specifically images of what the user would see without the device, as well as of his surroundings. For this, the image processing sub-module has an input of images obtained by the cameras and means for processing said images in order to adapt them to the screens, preferably of the NED type.

Preferably, the image processing sub-module may also have a series of object libraries in order, among other things, to identify the objects the user is observing and to classify them into a series of known objects. In short, images of objects are available in the database together with identifying information for each object (for example, a reference or its name). In this way, when the image of an object is captured, it is compared against the database for similar objects and, if there is a match with an object in the database, the captured object is classified according to the identifying information of the match. This identification of objects can also give the device the ability to provide more information to the user, indicating, for example by text, relevant information regarding the objects being observed.

Another possible use of object identification is the possibility of eliminating the image of real objects from the information perceivable by the user in the data output sub-module described below; thus, the user can be prevented from seeing certain images. In line with this possible use, a preferred embodiment of the invention comprises not only the elimination of said images of real objects from said information perceivable by the user, but also the inclusion of images of non-real (virtual) objects, stored in and available from object libraries such as those referred to above, as well as, where appropriate, their subsequent elimination.

In an exemplary embodiment, the user observes a machine in his environment. First, the image of the machine is captured and the database is consulted to locate similar images. Each image in the database carries at least one field of identifying information indicating, for example, that it is a man-made object. Consequently, the device marks the captured image with the information that it is a man-made object. After identifying the image, the device can be configured, for example, to eliminate all man-made objects, so that said image will be deleted in the output sub-module. Additionally, in the output sub-module, said image could be replaced by adding, for example, a plant in its place, in order to prevent the user from tripping while moving with the device on.

Sound processing sub-module (20)

The sound processing sub-module (20) starts with the capture of the incoming sound, which is preferably picked up through the at least one microphone (3) of the device. For the capture of ambient sound, the device may comprise two or more microphones (3) arranged, for example, on the sides of the device. In the case of directional microphones, said microphones can also detect the direction of the received sound. Alternatively, the data processing means may comprise sound processing means to estimate the position of the origin of the sound. Once the sound is captured, the noise is filtered (21).
Noise filtering can be done by any of the techniques known in the state of the art, such as the use of a Wiener filter; alternatively, the present invention also contemplates the use of Artificial Intelligence (AI) techniques similar to those used in the following phases of the processing.

Another stage contemplated by the present invention relates to the decomposition and identification of sounds (22). The objective of this stage is to discern between the different sounds detected and, once they are separated, to identify what each detected sound corresponds to. The present invention contemplates the decomposition of the sound by frequencies in order to obtain, for each sound, a spectrogram comprising at least frequency and intensity. Once the decomposition of each sound into frequencies and intensities is available, artificial intelligence algorithms are applied. Specifically, the decomposition and identification of sounds (22) is carried out on the basis of the frequency-intensity spectrogram, through different Machine Learning techniques such as Deep Learning, Template Matching, SVM ("Support Vector Machines") and other types of neural networks. Once the spectrogram is available, the incoming sounds are classified and differentiated from each other. Said techniques use a sound database; alternatively, said sound database may be a sound base accessible via the internet. The mentioned techniques are complementary, so combinations between them are also possible to perform the aforementioned decomposition and identification.

In order to classify sounds, Template Matching makes it possible to measure how similar two spectrograms are. Consequently, the received sounds can be compared with a sound database (26), said database being stored on the device or, alternatively, on a server and accessed preferably in real time or near real time.

SVM ("Support Vector Machines"), in turn, indicates the probability that each incoming sound corresponds to one of those stored in the sound database (26). For this, sound vectors are generated from the spectrogram and compared with the available vectors.

Neural networks (e.g. the multilayer perceptron, MLP) also work with vectors and can be used as the sole identification mechanism or in combination with other techniques to complement the information. In addition, although their training is usually slow, their application to classification is usually very fast, which suits the goal of working in real time. For the training of neural networks, emphasis can be placed on different sound properties; in particular, the use of the frequency-intensity pair is particularly advantageous to identify what each sound corresponds to. Alternatively, when faced with an unknown frequency-intensity pair, the user can be asked to identify the sound so that the neural network learns continuously; once the user has identified a new sound, it can be incorporated into the sound database (26) or, if it was already stored, used to improve the algorithm by further training the neural network.

In a particularly preferred embodiment, the technique used to identify what each captured sound corresponds to is Deep Learning, using the spectrograms as images to learn from.
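A hedged sketch of this Deep Learning variant follows: a small convolutional network (PyTorch) that classifies fixed-size spectrogram images into sound classes. The architecture, the 128x128 input size and the number of classes are illustrative assumptions, not values given in the patent.

```python
import torch
import torch.nn as nn

class SpectrogramCNN(nn.Module):
    """Classifies 1x128x128 spectrogram images into sound classes."""

    def __init__(self, num_classes: int):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                        # 128 -> 64
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                        # 64 -> 32
        )
        self.classifier = nn.Linear(32 * 32 * 32, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x has shape (batch, 1, 128, 128)
        return self.classifier(self.features(x).flatten(1))

# Training is the slow part; a single forward pass per captured spectrogram
# is fast enough for (near) real-time classification, as noted above.
model = SpectrogramCNN(num_classes=10)
logits = model(torch.randn(1, 1, 128, 128))
predicted_class = int(logits.argmax(dim=1))
```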
A noteworthy detail is that, although the learning (and therefore training) phases of the aforementioned algorithms can take a certain time, the application phase of what has been learned (prediction or classification) can be carried out very quickly, even in real time or near real time.

The spectrograms subsequently go through a weighting phase to obtain the decomposition and identification of the sounds. These results are passed to the next phase, the elimination of sounds, and a report of them is also passed to the processes that control the requirements of the integrated reality. This data is processed in the data output sub-module (30) in order to determine which sounds are to be kept, which are to be deleted (in the sound elimination phase (23)) and which new ones are to be incorporated (in the sound inclusion phase (24)).

Returning to the exemplary embodiment used to explain the image processing sub-module: the microphone of the device captures the sum of a plurality of sounds found in the environment. By means of the filtering means, sounds that are not of interest for processing and that can be considered noise are eliminated. By means of the decomposition by frequencies, the sounds are separated in order to obtain, for example, the sound corresponding to a machine, such as a car, or bird sounds. For each of these sounds a spectrogram is obtained in which the frequencies, as well as the sound intensities at each of said frequencies, can be identified. For at least one of said spectrograms, a search and comparison against the sounds stored in the database is performed in order to identify which object the spectrogram corresponds to. In the running example, one sound will have been identified as corresponding to a car and another as corresponding to a bird.

Device positioning and orientation sub-module

The device of the present invention has positioning means, for example a GNSS system ("Global Navigation Satellite System"), such as GPS, GLONASS, Galileo, etc. Additionally, the system has means to determine the orientation of the device, in particular accelerometers. Alternatively, inertial measurement units (IMU) or other, more complex three-dimensional orientation sensors, such as an AHRS ("Attitude and Heading Reference System"), could be used.

In a preferred embodiment of the invention, the device can determine its position (X, Y, Z) by means of a geo-positioning system (for example, a GNSS system). Additionally, the device can know its orientation (heading, elevation, roll angle) by means of at least one accelerometer, as sketched below.

In an exemplary embodiment, the system has a geo-referenced virtual map, so that the system has positional information of the user (by means of said positioning means) and emits, through the output module, at least partial images of said geo-referenced virtual map.
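The orientation part of this sub-module can be sketched as follows. Deriving elevation (pitch) and roll from a static accelerometer reading is standard; heading generally needs an additional sensor such as a magnetometer or gyroscope, an assumption made explicit in the code, and all names are illustrative.

```python
import math
from typing import NamedTuple

class Pose(NamedTuple):
    x: float                # position from the geo-positioning system
    y: float
    z: float
    heading: float          # degrees; assumed to come from another sensor
    elevation: float        # pitch, degrees, from the accelerometer
    roll: float             # degrees, from the accelerometer

def orientation_from_accel(ax: float, ay: float, az: float):
    """Pitch and roll (degrees) from a static accelerometer reading in g units."""
    pitch = math.degrees(math.atan2(-ax, math.hypot(ay, az)))
    roll = math.degrees(math.atan2(ay, az))
    return pitch, roll

def make_pose(gnss_xyz, accel, heading: float = 0.0) -> Pose:
    pitch, roll = orientation_from_accel(*accel)
    return Pose(*gnss_xyz, heading=heading, elevation=pitch, roll=roll)

# Example: a device lying flat measures roughly (0, 0, 1) g.
print(make_pose((0.0, 0.0, 0.0), (0.0, 0.0, 1.0)))
```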
Data output sub-module (30)

The input data to the data output sub-module (30) comprise at least images previously identified in the image processing sub-module and sounds previously identified in the sound processing sub-module.

In the data output sub-module, the captured images are correlated with the captured sounds by means of the identifying information obtained from the databases, although it is also possible to manage images and sounds that do not need, or for which it is not desired to make, any correlation with sounds or images, respectively.

For example, returning to the previous examples: an image has been obtained through the cameras which, after processing in the image processing sub-module, has been identified as corresponding to a machine. On the other hand, the sound processing sub-module has identified one sound corresponding to a car and another corresponding to a bird. The output sub-module analyzes the identifying references and correlates the spectrogram corresponding to the car with the captured image of the machine so that, if the system requirements demand the elimination of the machine, the output sub-module eliminates not only the image of the machine but also the sound corresponding to it. Additionally, the output sub-module may include the image of a bird (for example, obtaining it from the database), since it has identified a sound that corresponds to one.

Finally, the output sub-module has means of communication with the at least one screen (4) and the at least one speaker (5) in order to send the selected images and/or sounds.
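To make this correlation step concrete, here is a minimal sketch of how the output sub-module might drop a correlated image/sound pair together; the labels and the suppression policy are assumptions made for illustration.

```python
def correlate_and_filter(identified_images: dict, identified_sounds: dict,
                         suppress: set):
    """Drop every image/sound pair whose shared label is on the suppression list.

    Both inputs map an identifying label (from the databases) to the
    captured data; uncorrelated entries pass through unchanged.
    """
    out_images, out_sounds = dict(identified_images), dict(identified_sounds)
    for label in set(identified_images) & set(identified_sounds):
        if label in suppress:
            # Eliminate the correlated pair together: the image and its sound.
            out_images.pop(label)
            out_sounds.pop(label)
    return out_images, out_sounds

# Example from the description: the machine/car is suppressed, the bird is kept.
images = {"car": "machine_frame"}
sounds = {"car": "car_audio", "bird": "bird_audio"}
print(correlate_and_filter(images, sounds, suppress={"car"}))
# -> ({}, {'bird': 'bird_audio'})
```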
CLAIMS
[1] 1. Device for mounting on the head of a user for the perception of augmented reality, comprising:

• means for mounting on the user's head;
• at least one screen;
• at least one microphone;
• at least one speaker; and
• a data processing module, said module comprising, in turn, an image processing sub-module, a sound processing sub-module and a data output sub-module,

characterized in that the sound processing sub-module comprises means for decomposing the sounds received through the at least one microphone into a series of spectrograms, and means for correlating said spectrograms with spectrograms from a library in order to identify which physical entity produces each sound.

[2] 2. Device according to claim 1, characterized in that the correlation of the spectrograms generated from the received sounds with spectrograms from a library is performed by Template Matching.

[3] 3. Device according to claim 1, characterized in that the correlation of the spectrograms generated from the received sounds with spectrograms from a library is performed by neural networks.

[4] 4. Device according to claim 1, characterized in that the correlation of the spectrograms generated from the received sounds with spectrograms from a library is performed by SVM.

[5] 5. Device according to claim 1, characterized in that the correlation of the spectrograms generated from the received sounds with spectrograms from a library is performed by a multilayer perceptron (MLP).

[6] 6. Device according to claim 1, characterized in that the correlation of the spectrograms generated from the received sounds with spectrograms from a library is performed by Deep Learning.

[7] 7. Device according to any of the preceding claims, characterized in that the correlation of the spectrograms generated from the received sounds comprises the generation of images of the received spectrograms and, in addition, the library comprises spectrogram images.

[8] 8. Device according to any of the preceding claims, characterized in that the data output sub-module comprises means for eliminating certain sounds whose spectrograms have been identified.

[9] 9. Device according to any of the preceding claims, characterized in that the data output sub-module comprises means for adding sounds.

[10] 10. Device according to any of the preceding claims, characterized in that the data output sub-module comprises means for selecting spectrograms, means for selecting images, and means for transmitting the selected images and/or spectrograms to the at least one screen and/or speaker.

[11] 11. Device according to any of the preceding claims, characterized in that the data output sub-module has an input from the image processing sub-module and an input from the sound processing sub-module, and has means for correlating at least one of the inputs of the image processing sub-module with at least one of the inputs of the sound processing sub-module.

[12] 12. Device according to claim 11, characterized in that the data output sub-module has means for eliminating a correlated image if the sound corresponding to said image is eliminated.

[13] 13. Device according to claim 11, characterized in that the data output sub-module has means for eliminating a correlated sound if the image corresponding to said sound is eliminated.

[14] 14.
Device according to any of claims 1 to 13, characterized in that the data processing module further comprises a sub-module for positioning and orientation of the device.

[15] 15. Method of generating an augmented reality environment in a device according to any of claims 1 to 14, characterized in that it comprises the steps of:

I. obtaining images through at least one camera;
II. obtaining sounds through at least one microphone;
III. data processing, which in turn comprises the correlation of at least one of the received sounds with a received image corresponding to the physical entity that emits said sound, and the elimination of the sound together with the image corresponding to that physical entity; and
IV. reproduction, through the at least one screen and/or the at least one speaker, of the images and/or sounds that have not been eliminated in step III.

[16] 16. Method according to claim 15, characterized in that, for the correlation of the received sounds, a series of spectrograms corresponding to the received sounds is obtained beforehand.

[17] 17. Method according to claim 15 or 16, characterized in that the correlation of the received sounds comprises a comparison with a sound database.

[18] 18. Method according to claim 17, characterized in that the sound database is located on a server remote from the device.

[19] 19. Method according to any of claims 15 to 18, characterized in that the correlation of the received sounds with a received image comprises processing by neural networks.

[20] 20. Method according to any of claims 15 to 18, characterized in that the correlation of the received sounds with a received image comprises SVM processing.

[21] 21. Method according to any of claims 15 to 18, characterized in that the correlation of the received sounds with a received image comprises multilayer perceptron processing.

[22] 22. Method according to any of claims 15 to 18, characterized in that the correlation of the received sounds with a received image comprises Deep Learning processing.

[23] 23. Method according to any of claims 15 to 22, characterized in that step IV comprises reproduction, through the at least one screen and/or the at least one speaker, of at least one image and/or sound stored in the database in addition to those obtained in step I.

[24] 24. Method according to any of claims 15 to 23, characterized in that step III comprises the elimination of an image correlated with a sound if the sound corresponding to said image is eliminated.

[25] 25. Method according to any of claims 15 to 24, characterized in that step III comprises the elimination of a sound correlated with an image if the image corresponding to said sound is eliminated.

[26] 26. Method according to any of claims 15 to 25, characterized in that it comprises the incorporation or association of positioning and orientation data with the data processed in step III.
Patent family:

Publication number | Publication date
ES2639862B1 | 2018-09-10
Cited documents:

Publication number | Filing date | Publication date | Applicant | Title
US4952931A | 1987-01-27 | 1990-08-28 | Serageldin Ahmedelhadi Y | Signal adaptive processor
US7676372B1 | 1999-02-16 | 2010-03-09 | Yugen Kaisha Gm&M | Prosthetic hearing device that transforms a detected speech into a speech of a speech form assistive in understanding the semantic meaning in the detected speech
US20150279109A1 | 2010-12-22 | 2015-10-01 | Intel Corporation | Object mapping techniques for mobile augmented reality applications
Legal status:

2018-09-10 | FG2A | Definitive protection granted | Ref. document number: 2639862; country: ES; kind code: B1; effective date: 2018-09-10